We explore the effect of COVID-19 pandemic on crime rates across different cities in the US to get a glimpse into the future of crime in these unprecedented circumstances.
Our goal is to use COVID and Crime data to make future predictions on crime rates in this pandemic. We also look at the crime rates before and after the pandemic to gain insight into how quarantine and self-isolation measures influence human behaviour leading to an increase in specific types of crime.
Click on the location markers on the left to view their analysis !
Some text desccription
Some text desccription
This is text link
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
This is text link
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
This is text link
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
---
title: "COVID-19 and US Crime"
author: ""
output:
flexdashboard::flex_dashboard:
theme: flatly
orientation: rows
social: menu
source: embed
---
```{r setup, include=FALSE}
library(ggplot2)
library(plotly)
library(plyr)
library(flexdashboard)
library(RSocrata)
library(tidyverse)
library(tsbox) # transform data into time series
library(xts)
library(COVID19) # to get data about covid 19
library(forecast) #ariRI model
library(vars) #VAR and Causality
library(dygraphs)
library(leaflet)
library(htmlwidgets)
#[INCOMPLETE] Graphs in Chicago have been converted here to Plotly HTML Widgets
# Make some noisily increasing data [Testing Purposes]
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=10),
xvar = 1:20 + rnorm(20,sd=3),
yvar = 1:20 + rnorm(20,sd=3))
#### COVID19 DATA ####
#Load Chicago Data
covid19_CH <- covid19("USA", level = 3) %>%
# this cook county contains chicago
filter(administrative_area_level_3 == "Cook",
administrative_area_level_2 == "Illinois" ) %>%
# filter out days when confirmed is zero or one
# becasue when it was 2 for a very long time
filter(confirmed > 2)
# Load Providence data
covid19_RI <- covid19("USA", level = 2) %>%
filter(administrative_area_level_2 == "Rhode Island") %>%
# filter out days when confirmed is zero or one
# becasue when it was 1 for a very long time
filter(confirmed > 1)
## March 07 has 140 confirmed case which is impossible.
## Google shows that date had still 3 cumulative case
## Manual adjustment on row 5
covid19_RI$confirmed[5] = covid19_RI$confirmed[4]
# Load Boston Data
covid19_MA <- covid19("USA", level = 2) %>%
filter(administrative_area_level_2 == "Massachusetts") %>%
# filter out days when confirmed is zero or one
# becasue when it was 1 for a very long time
filter(confirmed > 1)
# Load Atlanta Data
covid19_AT <- covid19("USA", level = 3) %>%
filter(administrative_area_level_2 == "Georgia",
administrative_area_level_3 == 'Fulton') %>%
# filter out days when confirmed is zero or one
# becasue when it was 2 for a very long time
filter(confirmed > 2)
```
Overview {.storyboard}
=======================================================================
### Introduction
```{r}
Location <- c("Providence ","Los Angeles","Chicago","Boston","Seattle","Atlanta","Philadelphia" )
lat <- c(41.8240,33.82377,41.78798,42.361145,47.714965,33.753746, 39.952583)
lng <- c( -71.4128,-118.2668,-87.7738,-71.057083,-122.127166 ,-84.386330,-75.165222)
df <- data.frame(Location, lat,lng)
jsCode <- paste0('
function(el, x) {
var marker = document.getElementsByClassName("leaflet-marker-icon leaflet-zoom-animated leaflet-interactive");
marker[0].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#providence");};
marker[1].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#los-angeles");};
marker[2].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#chicago");};
marker[3].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#boston");};
marker[4].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#seattle");};
marker[5].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#atlanta");};
marker[6].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#philadelphia");};
}
')
m <- leaflet() %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(data=df,)%>%
onRender(jsCode)
m # Print the map
```
***
We explore the effect of COVID-19 pandemic on crime rates across different cities in the US to get a glimpse into the future of crime in these unprecedented circumstances.
Our goal is to use COVID and Crime data to make future predictions on crime rates in this pandemic. We also look at the crime rates before and after the pandemic to gain insight into how quarantine and self-isolation measures influence human behaviour leading to an increase in specific types of crime.
Click on the location markers on the left to view their analysis !
### Key Results
```{r}
```
***
Some text desccription
### Coming Soon
```{r}
```
***
Some text desccription
Chicago
=======================================================================
Row{data-height=300}
-------------------------------------
### Impulse response function (criminal damage)
```{r get the data for chiacgo}
if (exists("chicago")) is.data.frame(get("chicago")) else chicago <- RSocrata::read.socrata(
"https://data.cityofchicago.org/resource/ijzp-q8t2.csv?$where=year >= 2014",
app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
```
```{r}
# add date
chicago <- chicago %>%
mutate(Date = as.Date(substr(date, start = 1, stop = 10))) %>%
mutate(y_month = substr(date, start = 1, stop = 7)) %>%
mutate(month = substr(date, start = 6, stop = 7))
# summary of all crime
chicago_summary <- chicago %>%
group_by(primary_type) %>%
summarise(number_of_crime = n()) %>%
arrange(desc(number_of_crime))
```
```{r extract chicago crime}
# extract top 5 crime
top5crime <- chicago %>%
filter(primary_type %in% head(chicago_summary$primary_type, 5)) %>%
group_by(Date, primary_type) %>%
tally() %>%
spread(primary_type, n)
# rename columns
colnames(top5crime) <- c('time',
"assault",
"battery",
"criminal",
'deceptive',
"theft")
top5crime <- na.omit(top5crime)
# create time series
top5crime_xts <- ts_xts(top5crime[,1:2])
for (i in (3:ncol(top5crime))){
temp_xts <- ts_xts(top5crime[, c(1,i)])
top5crime_xts <- merge(top5crime_xts, temp_xts)
}
# extract difference, change per day
top5crime_diff <- na.omit(diff(top5crime_xts))
```
```{r covid 19 related exploration, message=F, warning=F}
# extract for tranforming into time series data
ts_CH <- covid19_CH %>%
dplyr::select(date, confirmed) %>%
ts_xts()
# try first log difference
ts_diff_CH <- na.omit(diff(ts_CH))
covid19_CH_diff <- data.frame(diff(covid19_CH$confirmed))
colnames(covid19_CH_diff)[1] = "confirmed"
covid19_CH_diff$date = covid19_CH$date[2:length(covid19_CH$date)]
# time as integer
covid19_CH_diff$timeInt = as.numeric(covid19_CH_diff$date)
# make a copy to avoid perfect collinearity
covid19_CH_diff$timeIid = covid19_CH_diff$timeInt
# GAMM model
# 50 too overfit. 15 looks decent
gamCH <- gamm4::gamm4(confirmed ~ s(timeInt, k=50), random = ~(1|timeIid),
data=covid19_CH_diff, family=poisson(link='log'))
toPredict = data.frame(time = seq(covid19_CH_diff$date[1],
covid19_CH_diff$date[length(covid19_CH_diff$date)],
by = '1 day'))
toPredict$timeInt = as.numeric(toPredict$time)
# obtain forecast
forecast <- data.frame(exp(do.call(cbind, mgcv::predict.gam(gamCH$gam, toPredict, se.fit=TRUE))))
# access residuals
CH_res <- data.frame(covid19_CH_diff$confirmed - forecast$fit)
# transform into time series
CH_res$time = covid19_CH_diff$date
colnames(CH_res)[1] = "residuals"
col_order <- c("time", "residuals")
CH_res <- CH_res[, col_order]
CH_res_ts <- ts_xts(CH_res)
```
```{r chicago top 5 crime VAR}
# specify common time range
# start from when covid was a thing
# end on May 25, to avoid effect of protests related to George Floyid.
common_time <- seq.Date(start(CH_res_ts), as.Date("2020-05-25"), by = "day")
# combine time series of crime and covid
combined_diff <- merge(top5crime_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
CH_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
```
```{r construct chicago var}
optimal_assault <- VARselect(na.omit(combined_diff)[,c(1,6)], type = 'none', lag.max = 10)
optimal_battery <- VARselect(na.omit(combined_diff)[,c(2,6)], type = 'none', lag.max = 10)
optimal_criminal <- VARselect(na.omit(combined_diff)[,c(3,6)], type = 'none', lag.max = 10)
optimal_deceptive <- VARselect(na.omit(combined_diff)[,c(4,6)], type = 'none', lag.max = 10)
optimal_theft <- VARselect(na.omit(combined_diff)[,c(5,6)], type = 'none', lag.max = 10)
VAR_assault <- VAR(y=as.ts(na.omit(combined_diff)[,c(1,6)]), p=optimal_assault$selection[1])
VAR_battery <- VAR(y=as.ts(na.omit(combined_diff)[,c(2,6)]), p=optimal_battery$selection[1])
VAR_criminal <- VAR(y=as.ts(na.omit(combined_diff)[,c(3,6)]),
p=optimal_criminal$selection[1])
VAR_deceptive <- VAR(y=as.ts(na.omit(combined_diff)[,c(4,6)]),
p=optimal_deceptive$selection[1])
VAR_theft <- VAR(y=as.ts(na.omit(combined_diff)[,c(5,6)]), p=optimal_theft$selection[1])
```
```{r, warning=F, message=F}
vehicle <- chicago %>%
filter(primary_type == 'MOTOR VEHICLE THEFT')%>%
group_by(Date, primary_type) %>%
tally() %>%
spread(primary_type, n)
colnames(vehicle) <- c('time', 'vehicle')
vehicle_xts <- ts_xts(na.omit(vehicle)[,1:2])
vehicle_diff <- na.omit(diff(vehicle_xts))
combined_diff2 <- merge(vehicle_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
CH_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
optimal_vehicle <- VARselect(na.omit(combined_diff2)[,c(1,2)], type = 'none', lag.max = 10)
VAR_vehicle <- VAR(y=as.ts(na.omit(combined_diff2)[,c(1,2)]), p=optimal_vehicle$selection[1])
```
```{r chicago irf}
# use car theft and criminal damage
par(mfrow = c(1,2))
lags = c(1:25)
# criminal damange
# only significant one is from covid to crime
irf_criminal_2 <- irf(VAR_criminal,
impulse = "residuals",
response = "criminal",
n.ahead = 24,
ortho = F)
# ggplot version of it.
irf_criminal_2_gg <- data.frame(
irf_criminal_2$irf$residuals[,1],
irf_criminal_2$Lower$residuals[,1],
irf_criminal_2$Upper$residuals[,1]
)
colnames(irf_criminal_2_gg) <- c("mean", "lower", "upper")
i1 <- ggplot(irf_criminal_2_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more criminal damage cases per day there will be
after 1 confirmed COVID-19 case") +
xlab("Number of days after 1 COVID-19 case")+
ylab("Number of criminal damange per day")
ggplotly(i1)
```
### Impulse response function (vehicle theft)
```{r}
# vehicle theft
# only significant one is from covid to crime
irf_vehicle_2 <- irf(VAR_vehicle,
impulse = "residuals",
response = "vehicle",
n.ahead = 24)
# ggplot version of it
irf_vehicle_2_gg <- data.frame(
irf_vehicle_2$irf$residuals[,1],
irf_vehicle_2$Lower$residuals[,1],
irf_vehicle_2$Upper$residuals[,1]
)
colnames(irf_vehicle_2_gg) <- c("mean", "lower", "upper")
i2 <- ggplot(irf_vehicle_2_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more vehicle theft cases per day there will be
after 1 confirmed COVID-19 case") +
xlab("Number of days after 1 COVID-19 case")+
ylab("Number of vehicle theft per day")
ggplotly(i2)
```
Row {data-height=400 .tabset .tabset-fade }
-------------------------------------
### Chicago Crime Daily Summary
```{r chicago daily freq}
# daily
# 2020 only
chicago_daily <- chicago %>%
dplyr::select(Date, primary_type, year) %>%
filter(primary_type %in% chicago_summary$primary_type[1:5] | primary_type == "MOTOR VEHICLE THEFT",
year == 2020) %>%
count(Date, primary_type) %>%
na.omit() %>%
ggplot(aes(Date, n, group = primary_type, color = primary_type)) +
geom_line() +
facet_wrap(~primary_type, scale = "free") +
scale_fill_brewer(palette = "Set1", breaks = rev(levels(chicago_summary$primary_type[1:5]))) +
ggtitle("Frequency of top 5 crime in Chicago in 2020 (Jan to June)")
ggplotly(chicago_daily)
```
### Chicago Crime Summary
```{r}
# year to year comparison
plt <- chicago %>%
dplyr::select(y_month, month, primary_type, year) %>%
filter(primary_type %in% chicago_summary$primary_type[1:5] | primary_type == "MOTOR VEHICLE THEFT", y_month != "2020-06") %>%
count(year, month, primary_type) %>%
na.omit()%>%
ggplot(aes(x=month, y=n, group = year, color = as.character(year))) +
geom_line() +
facet_wrap(~primary_type, nrow = 1) +
guides(color = guide_legend(reverse = TRUE)) +
xlab('Month') +
ylab('Cases') +
labs(col='Year')
ggplotly(plt) %>%
layout(legend=list(traceorder='reversed'))
```
### VAR forecast for criminal damage
```{r}
interval_value_formatter <- "function(num, opts, seriesName, g, row, col) {
value = g.getValue(row, col);
if(value[0] != value[2]) {
lower = Dygraph.numberValueFormatter(value[0], opts);
upper = Dygraph.numberValueFormatter(value[2], opts);
return '[' + lower + ', ' + upper + ']';
} else {
return Dygraph.numberValueFormatter(num, opts);
}
}"
```
```{r}
f_criminal <- forecast::forecast(VAR_criminal)
f_criminal$forecast$criminal %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more criminal damage cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR forecast for vehicle theft
```{r}
f_vehicle <- forecast::forecast(VAR_vehicle)
f_vehicle$forecast$vehicle %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more vehicle theft cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR forecast for assault
```{r}
f_assault <- forecast::forecast(VAR_assault)
f_assault$forecast$assault %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more assault cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR forecast for battery
```{r}
f_battery <- forecast::forecast(VAR_battery)
f_battery$forecast$battery %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more battery cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR forecast for deceptive
```{r}
f_deceptive <- forecast::forecast(VAR_deceptive)
f_deceptive$forecast$deceptive %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more deceptive practice cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR forecast for theft
```{r}
f_theft <- forecast::forecast(VAR_theft)
f_theft$forecast$theft %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more theft cases
compared to yesterday', ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
Row {data-height=300}
-----------------------------------------------------------------------
### Daily confirmed cases of COVID19 in Chicago
```{r}
# if (exists("chicago")) is.data.frame(get("chicago")) else chicago <- RSocrata::read.socrata(
# "https://data.cityofchicago.org/resource/ijzp-q8t2.csv?$where=year >= 2014",
# app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
#
#
# # add date
# chicago <- chicago %>%
# mutate(Date = substr(date, start = 1, stop = 10)) %>%
# mutate(y_month = substr(date, start = 1, stop = 7)) %>%
# mutate(month = substr(date, start = 6, stop = 7))
#
# # summary of all crime
# chicago_summary <- chicago %>%
# group_by(primary_type) %>%
# summarise(number_of_crime = n()) %>%
# arrange(desc(number_of_crime))
#
# # looks life theft is seeing sharp drop
#
# # year to year comparison
# plt <- chicago %>%
# dplyr::select(y_month, month, primary_type, year) %>%
# filter(primary_type %in% chicago_summary$primary_type[1:5]) %>%
# count(year, month, primary_type) %>%
# na.omit()%>% ggplot(aes(x=month, y=n, group = year, color = as.character(year))) + geom_line() + facet_free(~primary_type,scales = "free", space = "free")+xlab("Month") +
# ylab("Cases") + theme(legend.title = element_blank())
#
# ggplotly(plt)
#TEST CHUNK UNCOMMENT TO REDUCE FILE RUN TIME [Design]
# n <- 20
# x1 <- rnorm(n); x2 <- rnorm(n)
# y1 <- 2 * x1 + rnorm(n)
# y2 <- 3 * x2 + (2 + rnorm(n))
# A <- as.factor(rep(c(1, 2), each = n))
# df <- data.frame(x = c(x1, x2), y = c(y1, y2), A = A)
# fm <- lm(y ~ x + A, data = df)
#
# p <- ggplot(data = cbind(df, pred = predict(fm)), aes(x = x, y = y, color = A))
# p <- p + geom_point() + geom_line(aes(y = pred))
# ggplotly(p)
dygraph(ts_diff_CH)%>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4, colors = "red")%>%
dySeries("cd7b965f", label = "Chicago")%>%
dyLegend(show = "always", hideOnMouseOut = FALSE)
```
### Summary
This is text [link](http://www.example.com)
- one
- two
- three
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Providence
=======================================================================
Row{data-height=300 .tabset .tabset-fade}
-------------------------------------
```{r get data for providence}
if (exists("providence")) is.data.frame(get("providence")) else providence <- read.socrata(
"https://data.providenceri.gov/resource/rz3y-pz8v.csv",
app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
# add date
providence <- providence %>%
mutate(date = as.Date(substr(reported_date, start = 1, stop = 10))) %>%
mutate(y_month = substr(reported_date, start = 1, stop = 7)) %>%
# extract most vague name of crime
separate(offense_desc, c("crime", "detail_crime"), sep = ",")
providence_summary <- providence %>%
group_by(crime) %>%
summarise(number_of_crime = n()) %>%
arrange(desc(number_of_crime))
```
```{r extract crime case for providence, echo=FALSE}
#### Providence crime extract ####
# extract top 5 cases
top5crime <- providence %>%
filter(crime %in% head(providence_summary$crime,5)) %>%
group_by(date, crime) %>%
tally() %>%
spread(crime, n)
# replace NA with 0
top5crime[is.na(top5crime)] = 0
# rename columns
colnames(top5crime) <- c("time",
"assault",
"larceny",
"larceny_from_vehicle",
"missing",
"vandalism")
# create date
top5crime$time <- as.Date(top5crime$time,
format = "%Y-%m-%d")
# create time series
top5crime_xts <- ts_xts(top5crime[,1:2])
for (i in (3:ncol(top5crime))){
temp_xts <- ts_xts(top5crime[, c(1,i)])
top5crime_xts <- merge(top5crime_xts, temp_xts)
}
# extract difference, change per day
top5crime_diff <- na.omit(diff(top5crime_xts))
```
```{r covid 19 Providence, message=FALSE, warning=FALSE}
#### COVID19 RHODE ISLAND ####
# extract for transforming into time series data
ts_RI <- covid19_RI %>%
dplyr::select(date, confirmed) %>%
ts_xts()
# plot daily cases
# first difference
ts_diff_RI <- diff(ts_RI)
# since the end goal is to get residuals
# can add 2 to all values so that there is no negative value
# while having residuals unchanged
adj_diff_RI <- na.omit(ts_diff_RI[,1])
# construct data frame of difference, not time series
covid19_RI_diff <- data.frame(diff(covid19_RI$confirmed))
colnames(covid19_RI_diff)[1] = "confirmed"
covid19_RI_diff$date = covid19_RI$date[2:length(covid19_RI$date)]
# time as integer
covid19_RI_diff$timeInt = as.numeric(covid19_RI_diff$date)
# RIke a copy to avoid perfect collinearity for mixed effect
covid19_RI_diff$timeIid = covid19_RI_diff$timeInt
# GAMM model
gamRI <- gamm4::gamm4(confirmed ~ s(timeInt, k = 90),
random = ~(1|timeIid),
data = covid19_RI_diff,
family = poisson(link = 'log'))
# obtain fitted value
toPredict = data.frame(time = seq(covid19_RI_diff$date[1],
covid19_RI_diff$date[length(covid19_RI_diff$date)],
by = '1 day'))
toPredict$timeInt = as.numeric(toPredict$time)
# obtain forecast
forecast_covid <- data.frame(exp(do.call(cbind, mgcv::predict.gam(gamRI$gam, toPredict, se.fit=TRUE))))
# access residuals
RI_res <- data.frame(covid19_RI_diff$confirmed - forecast_covid$fit)
# transform into time series
RI_res$time = covid19_RI_diff$date
colnames(RI_res)[1] = "residuals"
col_order <- c("time", "residuals")
RI_res <- RI_res[, col_order]
RI_res_ts <- ts_xts(RI_res)
```
```{r providence top 5 crime VAR, echo=FALSE}
#### Build VAR for Providence
# specify common time range
# start from when covid was a thing
# end with the date of the death of George Floyid
common_time <- seq.Date(start(RI_res_ts), as.Date("2020-05-25") , by = "day")
# combine time series of crime and covid
combined_diff <- merge(top5crime_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
RI_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
# construct VAR
# variable selection based on AIC
optimal_assault <- VARselect(combined_diff[,c(1,6)], type = 'none', lag.max = 10)
optimal_larceny <- VARselect(combined_diff[,c(2,6)], type = 'none', lag.max = 10)
optimal_vehicle <- VARselect(combined_diff[,c(3,6)], type = 'none', lag.max = 10)
optimal_missing <- VARselect(combined_diff[,c(4,6)], type = 'none', lag.max = 10)
optimal_vandalism <- VARselect(combined_diff[,c(5,6)], type = 'none', lag.max = 10)
# use AIC as selection criteria
VAR_assault <- VAR(y=as.ts(combined_diff[,c(1,6)]), p=optimal_assault$selection[1])
VAR_larceny <- VAR(y=as.ts(combined_diff[,c(2,6)]), p=optimal_larceny$selection[1])
VAR_vehicle <- VAR(y=as.ts(combined_diff[,c(3,6)]), p=optimal_vehicle$selection[1])
VAR_missing <- VAR(y=as.ts(combined_diff[,c(4,6)]), p=optimal_missing$selection[1])
VAR_vandalism <- VAR(y=as.ts(combined_diff[,c(5,6)]), p=optimal_vandalism$selection[1])
```
### Impulse response function (Larceny from vehicle 1)
```{r irf providence larceny from vehicle to covid}
par(mfrow = c(1,2))
lags = c(1:25)
# larceny from vehicle
# crime tto covid
irf_vehicle1 <- irf(VAR_vehicle,
impulse = "larceny_from_vehicle",
response = "residuals",
n.ahead = 24)
# ggplot version of irf
irf_vehicle_1_gg <- data.frame(irf_vehicle1$irf$larceny_from_vehicle[,1],
irf_vehicle1$Lower$larceny_from_vehicle[,1],
irf_vehicle1$Upper$larceny_from_vehicle[,1])
colnames(irf_vehicle_1_gg) <- c("mean", "lower", "upper")
irf_vechile_1_plot <- ggplot(irf_vehicle_1_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more daily covid19 cases there will be
after 1 larceny from vehicle") +
xlab("Number of days after a larceny from vehicle")+
ylab("Number of new covid 19 cases")
# html version
# from crime to covid
ggplotly(irf_vechile_1_plot)
```
### Impulse response function (Larceny from vehicle 2)
```{r irf providence covid to larceny vehicle}
# larceny from vehicle
# covid to crime
irf_vehicle2 <- irf(VAR_vehicle,
impulse = "residuals",
response = "larceny_from_vehicle",
n.ahead = 24)
# ggplot version of irf
irf_vehicle_2_gg <- data.frame(irf_vehicle2$irf$residuals[,1],
irf_vehicle2$Lower$residuals[,1],
irf_vehicle2$Upper$residuals[,1])
colnames(irf_vehicle_2_gg) <- c("mean", "lower", "upper")
irf_vechile_2_plot <- ggplot(irf_vehicle_2_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
#geom_line(aes(y = upper), color = "red", linetype = "dashed") +
#geom_line(aes(y = lower), color = "red", linetype = "dashed") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more larceny from vechile cases per day there will be
after 1 confirmed covid19 case") +
xlab("Number of days after a confimed covid 19 case")+
ylab("Number of larceny from vechile")
# this one not really significant at a bit above 0.05 and the impact is not even -1.0. So could be okay to leave this out as well.
# html version
# from crime to covid
ggplotly(irf_vechile_2_plot)
```
### Impulse response function (Assault)
```{r irf providence assault}
irf_assault_1 <- irf(VAR_assault,
impulse = "residuals",
response = "assault",
n.ahead = 24)
irf_assault_1_gg <- data.frame(irf_assault_1$irf$residuals[,1],
irf_assault_1$Lower$residuals[,1],
irf_assault_1$Upper$residuals[,1])
colnames(irf_assault_1_gg) <- c("mean", "lower", "upper")
irf_assault_1_plot <- ggplot(irf_assault_1_gg, aes(x=lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more assault cases per day there will be
after 1 confirmed covid19 case") +
xlab("Number of days after a confimed covid 19 case") +
ylab("Number of assault cases")
ggplotly(irf_assault_1_plot)
```
### Impulse response function (Vandalism)
```{r irf providence vandalism}
# vandalism significant to covid
irf_vandalism_1 <- irf(VAR_vandalism,
impulse = "vandalism",
response = "residuals",
n.ahead = 24)
# ggplot version
irf_vandalism_1_gg <- data.frame(irf_vandalism_1$irf$vandalism[,1],
irf_vandalism_1$Lower$vandalism[,1],
irf_vandalism_1$Upper$vandalism[,1])
colnames(irf_vandalism_1_gg) <- c("mean", "lower", "upper")
irf_vandalism_1_plot <- ggplot(irf_vandalism_1_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more daily covid19 cases there will be
after 1 vandalism") +
xlab("Number of days after a vandalism")+
ylab("Number of new covid 19 cases")
ggplotly(irf_vandalism_1_plot)
```
### Impulse response function (Missing Person)
```{r irf providence missing}
# covid significant to missing
irf_missing_1 <- irf(VAR_missing,
impulse = "residuals",
response = "missing",
n.ahead = 24)
irf_missing_1_gg <- data.frame(irf_missing_1$irf$residuals[,1],
irf_missing_1$Lower$residuals[,1],
irf_missing_1$Upper$residuals[,1])
colnames(irf_missing_1_gg) <- c("mean", "lower", "upper")
irf_missing_1_plot <- ggplot(irf_missing_1_gg, aes(x=lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more missing person cases per day there will be
after 1 confirmed covid19 case") +
xlab("Number of days after a confimed covid 19 case") +
ylab("Number of missing person cases")
ggplotly(irf_missing_1_plot)
```
Row {data-height=400 .tabset .tabset-fade }
-------------------------------------
### Providence Crime Daily Summary
```{r providence daily freq}
# daily
providence_daily <- providence %>%
dplyr::select(date, crime) %>%
filter(crime %in% head(providence_summary$crime, 5)) %>%
count(date, crime) %>%
ggplot(aes(date, n, group = crime, color = crime)) +
geom_line() +
facet_free(~crime) +
scale_fill_brewer(palette = "Set1", breaks = rev(levels(providence_summary$crime[1:5]))) +
ggtitle("Daily requency of top 5 crime in Providence in the past 180 days") +
theme(legend.position = "none")
ggplotly(providence_daily)
```
### VAR forecast for Assault
```{r providence assault var}
# assault
# significant
forecast_assault <- forecast::forecast(VAR_assault)
forecast_assault$forecast$assault %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in assault cases in Providence",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Larceny
```{r providence larceny var}
# larceny
# not significant to larceny
forecast_larceny <- forecast::forecast(VAR_larceny)
forecast_larceny$forecast$larceny %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in larceny cases in Providence",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Larceny from Vehicle
```{r providence larceny from vehicle var}
forecast_vehicle <- forecast::forecast(VAR_vehicle)
forecast_vehicle$forecast$larceny_from_vehicle %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in larceny from vechile cases in Providence",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Missing Person
```{r providence missing person var}
# missing person
# significant
forecast_missing <- forecast::forecast(VAR_missing)
forecast_missing$forecast$missing %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in missing person cases in Providence",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
Row {data-height=300}
-----------------------------------------------------------------------
### Daily confirmed cases of COVID19 in Rhode Islands
```{r rhode island daily covid case}
dygraph(ts_diff_RI) %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4, colors = "red") %>%
dySeries("41224d53", label = "Rhode Island") %>%
dyLegend(show = 'always', hideOnMouseOut = FALSE)
```
### Summary
This is text [link](http://www.example.com)
- one
- two
- three
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Boston
=======================================================================
Row{data-height=300}
-------------------------------------
```{r get data for boston, echo=FALSE}
#### Boston data ####
if (exists("boston")) is.data.frame(get("boston")) else boston <- read.csv("https://data.boston.gov/dataset/6220d948-eae2-4e4b-8723-2dc8e67722a3/resource/12cb3883-56f5-47de-afa5-3b1cf61b257b/download/tmpqy9o_jgd.csv")
# add date
boston <- boston %>%
mutate(date = as.Date(substr(OCCURRED_ON_DATE, start = 1, stop = 10))) %>%
mutate(y_month = substr(OCCURRED_ON_DATE, start = 1, stop = 7)) %>%
mutate(MONTH = substr(date, start = 6, stop = 7))
# summary of all crime
boston_summary <- boston %>%
group_by(OFFENSE_DESCRIPTION) %>%
summarise(number_of_crime = n()) %>%
arrange(desc(number_of_crime))
```
```{r extract case for boston, echo=FALSE}
#### Boston crime extract ####
# extract top 5 crime
top5crime <- boston %>%
filter(OFFENSE_DESCRIPTION %in% head(boston_summary$OFFENSE_DESCRIPTION, 5)) %>%
group_by(date, OFFENSE_DESCRIPTION) %>%
tally() %>%
spread(OFFENSE_DESCRIPTION, n)
# rename columns
colnames(top5crime) <- c("time",
"investigate",
"property damage",
"medical",
"vandalism",
"dispute")
# create date
top5crime$time <- as.Date(top5crime$time,
format = "%Y-%m-%d")
# create time series
top5crime_xts <- ts_xts(top5crime[,1:2])
for (i in (3:ncol(top5crime))){
temp_xts <- ts_xts(top5crime[, c(1,i)])
top5crime_xts <- merge(top5crime_xts, temp_xts)
}
# extract difference, change per day
top5crime_diff <- na.omit(diff(top5crime_xts))
```
```{r covid 19 Boston, message=FALSE, warning=FALSE}
#### COVID 19 BOSTON ####
# extract for transforming into time series data
ts_MA <- covid19_MA %>%
dplyr::select(date, confirmed) %>%
ts_xts()
# first difference
ts_diff_MA <- na.omit(diff(ts_MA))
# construct data frame of difference, not time series
covid19_MA_diff <- data.frame(diff(covid19_MA$confirmed))
colnames(covid19_MA_diff)[1] = "confirmed"
covid19_MA_diff$date = covid19_MA$date[2:length(covid19_MA$date)]
# time as integer
covid19_MA_diff$timeInt = as.numeric(covid19_MA_diff$date)
# make a copy to avoid perfect collinearity for mixed effect
covid19_MA_diff$timeIid = covid19_MA_diff$timeInt
# GAMM model
gamMA <- gamm4::gamm4(confirmed ~ s(timeInt, k = 90),
random = ~(1|timeIid),
data = covid19_MA_diff,
family = poisson(link = 'log'))
# obtain fitted value
toPredict = data.frame(time = seq(covid19_MA_diff$date[1],
covid19_MA_diff$date[length(covid19_MA_diff$date)],
by = '1 day'))
toPredict$timeInt = as.numeric(toPredict$time)
# obtain forecast
forecast_covid <- data.frame(exp(do.call(cbind, mgcv::predict.gam(gamMA$gam, toPredict, se.fit=TRUE))))
# access residuals
MA_res <- data.frame(covid19_MA_diff$confirmed - forecast_covid$fit)
# transform into time series
MA_res$time = covid19_MA_diff$date
colnames(MA_res)[1] = "residuals"
col_order <- c("time", "residuals")
MA_res <- MA_res[, col_order]
MA_res_ts <- ts_xts(MA_res)
```
```{r boston top 5 crime VAR, echo=FALSE}
#### Build VAR for BOSTON ####
# specify common time range
# start from when covid was a thing
# end with May 25th the death of George Folyid
common_time <- seq.Date(start(MA_res_ts), as.Date("2020-05-25"), by = "day")
# combine time series of crime and covid
combined_diff <- merge(top5crime_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
MA_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
# construct VAR
# variable selection based on AIC
optimal_investigate <- VARselect(combined_diff[,c(1,6)], type = 'none', lag.max = 10)
optimal_damage <- VARselect(combined_diff[,c(2,6)], type = 'none', lag.max = 10)
optimal_medical <- VARselect(combined_diff[,c(3,6)], type = 'none', lag.max = 10)
optimal_vandalism <- VARselect(combined_diff[,c(4,6)], type = 'none', lag.max = 10)
optimal_dispute <- VARselect(combined_diff[,c(5,6)], type = 'none', lag.max = 10)
# construct the model based on smallest AIC
VAR_investigate <- VAR(y=as.ts(combined_diff[,c(1,6)]), p=optimal_investigate$selection[1])
VAR_damage <- VAR(y=as.ts(combined_diff[,c(2,6)]), p=optimal_damage$selection[1])
VAR_medical <- VAR(y=as.ts(combined_diff[,c(3,6)]), p=optimal_medical$selection[1])
VAR_vandalism <- VAR(y=as.ts(combined_diff[,c(4,6)]), p=optimal_vandalism$selection[1])
VAR_dispute <- VAR(y=as.ts(combined_diff[,c(5,6)]), p=optimal_dispute$selection[1])
```
### Impulse response function (Vandalism)
```{r irf boston vandalism}
lags = c(1:25)
par(mfrow = c(1,2))
# vandalism
# from crime to covid
irf_vandalism_1 <- irf(VAR_vandalism,
impulse = "vandalism",
response = "residuals",
n.ahead = 24)
# ggplot version
irf_vandalism_1_gg <- data.frame(irf_vandalism_1$irf$vandalism[,1],
irf_vandalism_1$Lower$vandalism[,1],
irf_vandalism_1$Upper$vandalism[,1])
colnames(irf_vandalism_1_gg) <- c("mean", "lower", "upper")
irf_vandalism_1_plot <- ggplot(irf_vandalism_1_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more daily covid19 cases there will be
after 1 vandalism") +
xlab("Number of days after a vandalism")+
ylab("Number of new covid 19 cases")
# html version
ggplotly(irf_vandalism_1_plot)
```
### Impulse response function (Verbal dispute)
```{r irf boston verbal dispute}
# verbal dispute
# from covid
irf_dispute_2 <- irf(VAR_dispute,
impulse = "residuals",
response = "dispute",
n.ahead = 24)
# ggplot version
irf_dispute_2_gg <- data.frame(irf_dispute_2$irf$residuals[,1],
irf_dispute_2$Lower$residuals[,1],
irf_dispute_2$Upper$residuals[,1])
colnames(irf_dispute_2_gg) <- c("mean", "lower", "upper")
irf_dispute_2_plot <- ggplot(irf_dispute_2_gg, aes(x=lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more verbal dispute cases per day there will be
after 1 confirmed covid19 case") +
xlab("Number of days after a confimed covid 19 case")+
ylab("Number of verbal dispute cases")
ggplotly(irf_dispute_2_plot)
```
Row {data-height=400 .tabset .tabset-fade }
-------------------------------------
### Boston Crime Daily Summary
```{r boston daily freq}
# daily
# 2020 only
boston_daily <- boston %>%
dplyr::select(date, OFFENSE_DESCRIPTION, YEAR) %>%
filter(OFFENSE_DESCRIPTION %in% head(boston_summary$OFFENSE_DESCRIPTION, 5),
YEAR == 2020) %>%
count(date, OFFENSE_DESCRIPTION) %>%
na.omit() %>%
ggplot(aes(date, n, group = OFFENSE_DESCRIPTION, color = OFFENSE_DESCRIPTION)) +
geom_line() +
facet_free(~OFFENSE_DESCRIPTION, space = "free") +
scale_fill_brewer(palette = "Set1", breaks = rev(levels(boston_summary$OFFENSE_DESCRIPTION[1:5]))) +
theme(legend.title = element_blank()) +
ylab("Cases") +
theme(legend.position = "none")
ggplotly(boston_daily)
# bunch of crime seem to be affected by BLM protests
```
### Boston Crime Summary
```{r boston yty comparison}
# year to year comparison
# exclude 2020-06 due to incomplete info
yty <- boston %>%
dplyr::select(MONTH, y_month, OFFENSE_DESCRIPTION, YEAR) %>%
filter(OFFENSE_DESCRIPTION %in% head(boston_summary$OFFENSE_DESCRIPTION, 5),
y_month != "2020-06") %>%
count(YEAR, MONTH, OFFENSE_DESCRIPTION) %>%
na.omit() %>%
ggplot(aes(x=MONTH, y=n, group = YEAR, color = as.character(YEAR))) +
geom_line() +
facet_free(~OFFENSE_DESCRIPTION, scales = "free") +
ylab("Cases") +
guides(color = guide_legend(reverse = TRUE)) +
theme(legend.title = element_blank())
ggplotly(yty) %>%
layout(legend=list(traceorder='reversed'))
```
### VAR forecast for Investigate Person
```{r boston investigate var}
forecast_investigate <- forecast(VAR_investigate)
forecast_investigate$forecast$investigate %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in investigate person cases in Boston",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Property Damage
```{r boston property damage var}
# property damage
forecast_damage <- forecast(VAR_damage)
forecast_damage$forecast$property.damage %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in property damange cases in Boston",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Medical cases
```{r boston medical var}
# medical attention
forecast_medical <- forecast(VAR_medical)
forecast_medical$forecast$medical %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in medical cases in Boston",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Vandalism
```{r boston vandalism var}
# vandalism
forecast_vandalism <- forecast(VAR_vandalism)
forecast_vandalism$forecast$vandalism %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in vandalism cases in Boston",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
### VAR forecast for Verbal Dispute
```{r boston dispute var}
# verbal dispute
forecast_dispute <- forecast(VAR_dispute)
forecast_dispute$forecast$dispute %>%
{cbind(actuals = .$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = "Daily forecast for day-to-day change in verbal dispute in Boston",
ylab = "Day-to-day change") %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(labelsSeparateLines=TRUE) %>%
dyRangeSelector() %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1])) %>%
dyLegend(show = "follow")
```
Row {data-height=300}
-----------------------------------------------------------------------
### Daily confirmed cases of COVID19 in Massachusetts
```{r massachusetts daily covid case}
dygraph(ts_diff_MA) %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4, colors = "red") %>%
dySeries("82a3cbff", label = "Massachusetts") %>%
dyLegend(show = "always", hideOnMouseOut = FALSE)
```
### Summary
This is text [link](http://www.example.com)
- one
- two
- three
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Philadelphia
=======================================================================
Los Angeles
=======================================================================
Atlanta
=======================================================================
### Impulse response function (burglary)
```{r get the data for atlanta}
url1 <- 'https://www.atlantapd.org/Home/ShowDocument?id=3279'
temp <- tempfile()
download.file(url1, temp, mode = 'wb')
zip_data1 <- read.csv(unz(temp, 'COBRA-2020.csv'))
unlink(temp)
# download historical data before 2020
url2 <- 'https://www.atlantapd.org/Home/ShowDocument?id=3051'
temp <- tempfile()
download.file(url2, temp, mode = 'wb')
zip_data2 <- read.csv(unz(temp, 'COBRA-2009-2019.csv'))
unlink(temp)
```
```{r}
zip_data2 <- zip_data2 %>%
filter(substr(Occur.Date, start = 1, stop = 4) >= '2014')
library(lubridate)
zip_data1$occur_date <- format(as.Date(zip_data1$occur_date, "%m/%d/%Y"), '%Y-%m-%d')
zip_data2$UCR.Literal <- gsub('ROBBERY-COMMERCIAL', 'ROBBERY', zip_data2$UCR.Literal)
zip_data2$UCR.Literal <- gsub('ROBBERY-PEDESTRIAN', 'ROBBERY', zip_data2$UCR.Literal)
zip_data2$UCR.Literal <- gsub('ROBBERY-RESIDENCE', 'ROBBERY', zip_data2$UCR.Literal)
zip_data2$UCR.Literal <- gsub('BURGLARY-RESIDENCE', 'BURGLARY', zip_data2$UCR.Literal)
zip_data2$UCR.Literal <- gsub('BURGLARY-NONRES', 'BURGLARY', zip_data2$UCR.Literal)
zip_data2$UCR.Literal <- gsub('LARCENY-FROM VEHICLE', 'LARCENY', zip_data2$UCR.Literal)
zip_data2$UCR.Literal <- gsub('LARCENY-NON VEHICLE', 'LARCENY', zip_data2$UCR.Literal)
colnames(zip_data1) <- c("Report.Number", "Report.Date", "Occur.Date", "Occur.Time", "Possible.Date", "Possible.Time","Beat","Apartment.Office.Prefix","Apartment.Number","Location","MinOfucr","dispo_code","Shift.Occurence","Location.Type","UCR.Literal","IBR.Code","Neighborhood", "NPU", "Latitude","Longitude")
zip_data1 <- zip_data1 %>%
filter(substr(Occur.Date, start = 1, stop = 4) >= '2014')
atlanta <- merge(zip_data1,zip_data2, all = T)
# add date
atlanta <- atlanta %>%
mutate(y_month = substr(Occur.Date, start = 1, stop = 7)) %>%
mutate(YEAR = substr(Occur.Date, start = 1, stop = 4)) %>%
mutate(MONTH = substr(Occur.Date, start = 6, stop = 7))
# summary of all crime
atlanta_summary <- atlanta %>%
group_by(UCR.Literal) %>%
summarise(number_of_crime = n()) %>%
arrange(desc(number_of_crime))
```
```{r extract atlanta crime}
# extract all crimes
top5crime <- atlanta %>%
filter(UCR.Literal %in% head(atlanta_summary$UCR.Literal, 5)) %>%
group_by(Occur.Date, UCR.Literal) %>%
tally() %>%
spread(UCR.Literal, n)
top5crime[is.na(top5crime)] = 0
# rename columns
colnames(top5crime) <- c('time',
'assault',
"autotheft",
"burglary",
"larceny",
'robbery')
# create time series
top5crime_xts <- ts_xts(top5crime[,1:2])
for (i in (3:ncol(top5crime))){
temp_xts <- ts_xts(top5crime[, c(1,i)])
top5crime_xts <- merge(top5crime_xts, temp_xts)
}
# extract difference, change per day
top5crime_diff <- na.omit(diff(top5crime_xts))
```
```{r top 5 crime VAR}
# extract for tranforming into time series data
ts_AT <- covid19_AT %>%
dplyr::select(date, confirmed) %>%
ts_xts()
# try first log difference
ts_diff_AT <- diff(ts_AT)
adj_diff_AT <- na.omit(ts_diff_AT[,1] + 10)
covid19_AT_diff <- data.frame(diff(covid19_AT$confirmed) + 10)
colnames(covid19_AT_diff)[1] = "confirmed"
covid19_AT_diff$date = covid19_AT$date[2:length(covid19_AT$date)]
# time as integer
covid19_AT_diff$timeInt = as.numeric(covid19_AT_diff$date)
# make a copy to avoid perfect collinearity
covid19_AT_diff$timeIid = covid19_AT_diff$timeInt
# GAMM model
# 50 too overfit. 15 looks decent
gamAT <- gamm4::gamm4(confirmed ~ s(timeInt, k=90), random = ~(1|timeIid),
data=covid19_AT_diff, family=poisson(link='log'))
# looks like random intercept is making little difference.
# choose to not have random effect to preserve it for time series analysis
# plot fitted value
toPredict = data.frame(time = seq(covid19_AT_diff$date[1],
covid19_AT_diff$date[length(covid19_AT_diff$date)],
by = '1 day'))
toPredict$timeInt = as.numeric(toPredict$time)
# obtain forecast
forecast <- data.frame(exp(do.call(cbind, mgcv::predict.gam(gamAT$gam, toPredict, se.fit=TRUE))))
# access residuals
AT_res <- data.frame(covid19_AT_diff$confirmed - forecast$fit)
# transform into time series
AT_res$time = covid19_AT_diff$date
colnames(AT_res)[1] = "residuals"
col_order <- c("time", "residuals")
AT_res <- AT_res[, col_order]
AT_res_ts <- ts_xts(AT_res)
```
```{r}
common_time <- seq.Date(start(AT_res_ts), as.Date("2020-05-25"), by = "day")
# combine time series of crime and covid
combined_diff <- merge(top5crime_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
AT_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
```
```{r construct atlanta var, warning = FALSE}
# variable selection based on AIC
optimal_assault <- VARselect(na.omit(combined_diff)[,c(1,6)], type = 'none', lag.max = 10)
optimal_autotheft <- VARselect(na.omit(combined_diff)[,c(2,6)], type = 'none', lag.max = 10)
optimal_burglary <- VARselect(na.omit(combined_diff)[,c(3,6)], type = 'none', lag.max = 10)
optimal_larceny <- VARselect(na.omit(combined_diff)[,c(4,6)], type = 'none', lag.max = 10)
optimal_robbery <- VARselect(na.omit(combined_diff)[,c(5,6)], type = 'none', lag.max = 10)
# use AIC as selection criteria
VAR_assault <- VAR(y=as.ts(na.omit(combined_diff)[,c(1,6)]), p=optimal_assault$selection[1])
VAR_autotheft <- VAR(y=as.ts(na.omit(combined_diff)[,c(2,6)]),
p=optimal_autotheft$selection[1])
VAR_burglary <- VAR(y=as.ts(na.omit(combined_diff)[,c(3,6)]),
p=optimal_burglary$selection[1])
VAR_larceny <- VAR(y=as.ts(na.omit(combined_diff)[,c(4,6)]),
p=optimal_larceny$selection[1])
VAR_robbery <- VAR(y=as.ts(na.omit(combined_diff)[,c(5,6)]),
p=optimal_robbery$selection[1])
```
```{r atlanta irf}
lags = c(1:25)
# only covid significant to burglary
irf_burglary_1 <- irf(VAR_burglary,
impulse = "residuals",
response = "burglary",
n.ahead = 24)
# ggplot
irf_burglary_1_gg <- data.frame(irf_burglary_1$irf$residuals[,1],
irf_burglary_1$Lower$residuals[,1],
irf_burglary_1$Upper$residuals[,1])
colnames(irf_burglary_1_gg) <- c("mean", "lower", "upper")
irf_burglary_1_plot <- ggplot(irf_burglary_1_gg, aes(x=lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more burglary cases per day there will be
after 1 confirmed covid19 case") +
xlab("Number of days after a confimed covid 19 case")+
ylab("Number of bulglary cases")
ggplotly(irf_burglary_1_plot)
```
Row {data-height=400 .tabset .tabset-fade }
-------------------------------------
### Atlanta Crime Summary
```{r}
# year to year comparison
plt <- atlanta %>%
dplyr::select(y_month, MONTH, UCR.Literal, YEAR) %>%
filter(UCR.Literal %in% atlanta_summary$UCR.Literal[1:5], y_month != "2020-06") %>%
count(YEAR, MONTH, UCR.Literal) %>%
na.omit() %>%
ggplot(aes(x=MONTH, y=n, group = YEAR, color = as.character(YEAR))) +
geom_line() +
facet_wrap(~UCR.Literal,nrow = 1) +
guides(color = guide_legend(reverse = TRUE)) +
xlab('Month') +
ylab('Cases') +
labs(col='Year')
ggplotly(plt) %>%
layout(legend=list(traceorder='reversed'))
```
### VAR Forecast for burglary
```{r custom function}
interval_value_formatter <- "function(num, opts, seriesName, g, row, col) {
value = g.getValue(row, col);
if(value[0] != value[2]) {
lower = Dygraph.numberValueFormatter(value[0], opts);
upper = Dygraph.numberValueFormatter(value[2], opts);
return '[' + lower + ', ' + upper + ']';
} else {
return Dygraph.numberValueFormatter(num, opts);
}
}"
```
```{r}
f_burglary <- forecast::forecast(VAR_burglary)
f_burglary$forecast$burglary %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more burglary cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR Forecast for assault
```{r}
f_assault <- forecast::forecast(VAR_assault)
f_assault$forecast$assault %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more aggregate assault cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR Forecast for auto theft
```{r}
f_autotheft <- forecast::forecast(VAR_autotheft)
f_autotheft$forecast$autotheft %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more auto theft cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR Forecast for larceny
```{r}
f_larceny <- forecast::forecast(VAR_larceny)
f_larceny$forecast$larceny %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more larceny cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
### VAR Forecast for robbery
```{r}
f_robbery <- forecast::forecast(VAR_robbery)
f_robbery$forecast$robbery %>%
{cbind(actuals=.$x, forecast_mean=.$mean,
lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
dygraph(main = 'Prediction on how many more robbery cases
compared to yesterday',
ylab = 'Day-to-day change') %>%
dyAxis("y", valueFormatter = interval_value_formatter) %>%
dySeries("actuals", color = "black") %>%
dySeries("forecast_mean", color = "blue", label = "forecast") %>%
dySeries(c("lower_80", "forecast_mean", "upper_80"),
label = "80%", color = "blue") %>%
dySeries(c("lower_95", "forecast_mean", "upper_95"),
label = "95%", color = "blue")%>%
dyLegend(show = 'follow') %>%
dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```
Seattle
=======================================================================